A Randomized Exhaustive Propositionalization Approach for Molecule Classification
نویسندگان
چکیده
Drug discovery is the process of designing compounds that have desirable properties, such as activity and non-toxicity. Molecule classification techniques are used along this process to predict the properties of the compounds in order to expedite their testing. Ideally, the classification rules found should be accurate and reveal novel chemical properties, but current molecule representation techniques lead to less than adequate accuracy and knowledge discovery. This work extends the propositionalization approach recently proposed for multi-relational data mining in two ways: it generates expressive attributes exhaustively and it uses randomization to sample a limited set of complex (“deep”) attributes. Our experimental tests show that the procedure is able to generate meaningful and interpretable attributes from molecular structural data, and that these features are effective for classification purposes.
منابع مشابه
Clustering Relational Data Based on Randomized Propositionalization
Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying standard clustering algorithms like KMeans to multi-relational data. We describe how random rules are generated and then turned into boolean-valued features. Clustering generall...
متن کاملFlexible propositionalization of continuous attributes in relational data mining
In a relational database, data are stored in primary and secondary tables. Propositionalization can transform a relational database into a single attribute-value table, and hence becomes a useful technique for mining relational databases. However, most of the existing propositionalization approaches deal with categorical attributes, and cannot handle a threshold on an attribute and a threshold ...
متن کاملPropositionalization of Relational Learning: An Information Extraction Case Study
This paper develops a new propositionalization approach for relational learning which allows for efficient representation and learning of relational information using propositional means. We develop a relational representation language, along with a relation generation function that produces features in this language in a data driven way; together, these allow efficient representation of the re...
متن کاملA Link-Based Method for Propositionalization
Propositionalization, a popular technique in Inductive Logic Programming, aims at converting a relational problem into an attributevalue one. An important facet of propositionalization consists in building a set of relevant features. To this end we propose a new method, based on a synthetic representation of the database, modeling the links between connected ground atoms. Comparing it to two st...
متن کاملPropositionalization Through Relational Association Rules Mining
In this paper we propose a novel (multi-)relational classification framework based on propositionalization. Propositionalization makes use of discovered relational association rules and permits to significantly reduce feature space through a feature reduction algorithm. The method is implemented in a Data Mining system tightly integrated with a relational database. It performs the classificatio...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- INFORMS Journal on Computing
دوره 23 شماره
صفحات -
تاریخ انتشار 2011